Two Step CCA: A new spectral method for estimating vector models of words
Authors
Abstract
Unlabeled data is often used to learn representations that can supplement baseline features in a supervised learner. For example, in text applications, where words lie in a very high-dimensional space (the size of the vocabulary), one can learn a low-rank “dictionary” by an eigendecomposition of the word co-occurrence matrix (e.g., using PCA or CCA). In this paper, we present a new spectral method based on CCA to learn an eigenword dictionary. Our improved procedure computes two sets of CCAs: the first between the left and right contexts of the given word, and the second between the projections resulting from this CCA and the word itself. We prove theoretically that this two-step procedure has lower sample complexity than the simple single-step procedure, and we also illustrate the empirical efficacy of our approach and the richness of the representations learned by our Two Step CCA (TSCCA) procedure on the tasks of POS tagging and sentiment classification.
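The two-step procedure described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`inv_sqrt`, `cca`, `two_step_cca`, `one_hot`) are hypothetical. The ridge term `reg`, the use of uncentered second moments, the one-word context window on each side, and the choice of the word-side projection as the eigenword dictionary are also assumptions made only for this sketch. Each CCA is computed as an SVD of the whitened cross-covariance matrix.

```python
import numpy as np


def inv_sqrt(c: np.ndarray, reg: float) -> np.ndarray:
    """Inverse square root of a symmetric PSD matrix, with a ridge term (assumption)."""
    vals, vecs = np.linalg.eigh(c + reg * np.eye(c.shape[0]))
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(x: np.ndarray, y: np.ndarray, k: int, reg: float = 1e-3):
    """Top-k CCA projections for x (n x p) and y (n x q).

    Uses uncentered second moments (an assumption of this sketch) and an SVD
    of the whitened cross-covariance Cxx^{-1/2} Cxy Cyy^{-1/2}.
    """
    n = x.shape[0]
    wx = inv_sqrt(x.T @ x / n, reg)
    wy = inv_sqrt(y.T @ y / n, reg)
    u, s, vt = np.linalg.svd(wx @ (x.T @ y / n) @ wy)
    # Map the top-k singular vectors back to the original coordinates.
    return wx @ u[:, :k], wy @ vt[:k].T, s[:k]


def two_step_cca(left, right, word, k: int, reg: float = 1e-3):
    """Hypothetical sketch of Two Step CCA (TSCCA)."""
    # Step 1: CCA between the left and right contexts.
    phi_l, phi_r, _ = cca(left, right, k, reg)
    # Step 2: CCA between the concatenated context projections and the word itself.
    s = np.hstack([left @ phi_l, right @ phi_r])
    phi_s, phi_w, _ = cca(s, word, k, reg)
    # phi_w is (vocab x k): one k-dimensional eigenword per vocabulary entry.
    return phi_w, phi_s


def one_hot(ids: np.ndarray, v: int) -> np.ndarray:
    out = np.zeros((len(ids), v))
    out[np.arange(len(ids)), ids] = 1.0
    return out


# Toy corpus of random token ids (no real linguistic signal; shapes only).
rng = np.random.default_rng(0)
v, n, k = 20, 5000, 5
tokens = rng.integers(0, v, size=n + 2)
word = one_hot(tokens[1:-1], v)   # the word at each position
left = one_hot(tokens[:-2], v)    # the word immediately to its left
right = one_hot(tokens[2:], v)    # the word immediately to its right

eigenwords, _ = two_step_cca(left, right, word, k)
print(eigenwords.shape)  # (20, 5)
```

On a real corpus, `left`, `right`, and `word` would be large sparse matrices and the window would typically span several words. This sketch keeps everything dense and small so that the two CCA steps stay easy to follow.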
Publication date: 2012